Behavior of Consolidated Trees when using Resampling Techniques
Abstract
Many machine learning areas use subsampling techniques with different objectives: reducing the size of the training set, compensating for class imbalance or non-uniform error costs, etc. Subsampling severely affects the behavior of classification algorithms: decision trees induced from different subsamples of the same data set differ greatly in accuracy and structure. This undermines the explanation of the classification, which is very important in some domains. This paper presents a new methodology for building decision trees. The final classifier is a single decision tree, so it retains the ability to explain the classification. We compare our algorithm with C4.5 in terms of error and structural stability. When subsampling techniques are used, the decision trees generated by the new algorithm achieve lower error rates and are structurally more stable than those built by C4.5.
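The abstract does not spell out how several subsamples lead to one tree. As a rough illustration only, the sketch below assumes that the shared split at a node is chosen by a simple majority vote: each subsample proposes the attribute it would split on by itself, the most-voted attribute is applied to every subsample, and so all subsamples keep growing a single common structure. Balanced undersampling as the resampling technique, information gain on categorical attributes, and every function name here are hypothetical choices for the sketch; it is not the authors' C4.5-based implementation.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(X, y, attr):
    """Information gain of splitting on the categorical attribute at index `attr`."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(y) * entropy(g) for g in groups.values())
    return entropy(y) - remainder


def best_attribute(X, y):
    """Attribute a single subsample would choose on its own (highest gain, first on ties)."""
    return max(range(len(X[0])), key=lambda a: info_gain(X, y, a))


def balanced_subsample(X, y, rng):
    """Hypothetical resampling: undersample every class to the size of the smallest one."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    k = min(len(idx) for idx in by_class.values())
    chosen = [i for idx in by_class.values() for i in rng.sample(idx, k)]
    return [X[i] for i in chosen], [y[i] for i in chosen]


def consolidated_attribute(subsamples):
    """Each subsample votes for its best attribute; the majority choice is shared by all."""
    votes = Counter(best_attribute(X, y) for X, y in subsamples)
    attr, _ = votes.most_common(1)[0]
    return attr, votes


def partition(subsamples, attr):
    """Apply the same split to every subsample: attribute value -> list of (rows, labels)."""
    branches = {}
    for X, y in subsamples:
        parts = {}
        for row, label in zip(X, y):
            rows, labels = parts.setdefault(row[attr], ([], []))
            rows.append(row)
            labels.append(label)
        for value, part in parts.items():
            branches.setdefault(value, []).append(part)
    return branches


if __name__ == "__main__":
    # Toy imbalanced data: attribute 0 determines the class, attribute 1 is noise.
    X = [[0, i % 2] for i in range(20)] + [[1, i % 2] for i in range(6)]
    y = [0] * 20 + [1] * 6
    rng = random.Random(0)
    subsamples = [balanced_subsample(X, y, rng) for _ in range(5)]
    attr, votes = consolidated_attribute(subsamples)
    print("shared split attribute:", attr, "votes:", dict(votes))
    print("branches:", sorted(partition(subsamples, attr)))
```

The key design point is that the split is decided once per node and then imposed on all subsamples, so the subsamples never diverge into different trees; a full version would recurse into each branch and also need a stopping rule, which this sketch leaves out.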
Similar articles
Consolidated Trees: An Analysis of Structural Convergence
When different subsamples of the same data set are used to induce classification trees, the structure of the built classifiers is very different. The stability of the structure of the tree is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields, customer’s behaviour analysis (marketing), etc, where comprehensibility of the classifier is necessary...
The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm
In many pattern recognition problems, the explanation of the classification made becomes as important as the good performance of the classifier related to its discriminating capacity. For this kind of problem we can use the Consolidated Trees’ Construction (CTC) algorithm, which uses several subsamples to build a single tree. This paper presents a wide analysis of the behavior of CTC algorithm for ...
A new algorithm to build consolidated trees: study of the error rate and steadiness
This paper presents a new methodology for building decision trees, the Consolidated Trees Construction algorithm, that improves on the behavior of C4.5. It reduces both the error and the complexity of the induced trees, and the differences in complexity are statistically significant. The advantage of this methodology with respect to other techniques such as bagging, boosting, etc. is that the final classif...
Consolidated Tree Construction Algorithm: Structurally Steady Trees
This paper presents a new methodology for building decision trees or classification trees (the Consolidated Trees Construction algorithm) that addresses the instability this paradigm shows when the training set varies slightly. As a consequence, the understanding of the resulting classification is not lost, which sets this technique apart from techniques such as bagging and b...
Selecting Multiway Splits in Decision Trees
Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf. There are efficient algorithms for finding the optimal multiway split for a numeric attribute, given the number of intervals in which it is to be divided. The problem we tackle is how to choose this n...
Publication date: 2004